Semantic Object Parsing with Local-Global Long Short-Term Memory
Semantic object parsing is a fundamental task for understanding objects in
detail in the computer vision community, where incorporating multi-level
contextual information is critical for such fine-grained pixel-level
recognition. Prior methods often leverage contextual information by
post-processing predicted confidence maps. In this work, we propose a novel
deep Local-Global Long Short-Term Memory (LG-LSTM) architecture to seamlessly
incorporate short-distance and long-distance spatial dependencies into the
feature learning over all pixel positions. In each LG-LSTM layer, local
guidance from neighboring positions and global guidance from the whole image
are imposed on each position to better exploit complex local and global
contextual information. Individual LSTMs for distinct spatial dimensions are
also utilized to intrinsically capture various spatial layouts of semantic
parts in the images, yielding distinct hidden and memory cells of each position
for each dimension. In our parsing approach, several LG-LSTM layers are stacked
and appended to the intermediate convolutional layers to directly enhance
visual features, allowing network parameters to be learned in an end-to-end
way. The long chains of sequential computation in stacked LG-LSTM layers also
enable each pixel to sense a much larger region for inference, benefiting from
the memorization of previous dependencies across all positions along all
dimensions. Comprehensive evaluations on three public datasets demonstrate
the significant superiority of our LG-LSTM over other state-of-the-art methods.
Comment: 10 pages
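The idea of imposing local guidance from neighboring positions and global guidance from the whole image on each position's LSTM gates can be sketched roughly as follows. This is a minimal numpy illustration, not the paper's exact parameterization: the neighborhood aggregation, weight shapes (`W`, `U`, `G`, `b`), and the use of mean pooling for the global term are all simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lg_lstm_step(feat_map, h_map, c_map, W, U, G, b):
    """One illustrative LG-LSTM-style update over an H x W x D feature map.

    For every position, the local guidance is taken as the mean hidden
    state of its 3x3 neighborhood and the global guidance as the mean
    hidden state of the whole map; both are fed into standard LSTM gates.
    All shapes and the pooling choices are assumptions for illustration.
    """
    H, Wd, _ = feat_map.shape
    Dh = h_map.shape[-1]
    new_h = np.zeros_like(h_map)
    new_c = np.zeros_like(c_map)
    g_global = h_map.reshape(-1, Dh).mean(axis=0)       # global guidance
    padded = np.pad(h_map, ((1, 1), (1, 1), (0, 0)))    # zero-pad borders
    for i in range(H):
        for j in range(Wd):
            # local guidance: mean hidden state over the 3x3 neighborhood
            local = padded[i:i + 3, j:j + 3].reshape(-1, Dh).mean(axis=0)
            x = feat_map[i, j]
            z = W @ x + U @ local + G @ g_global + b    # stacked gate pre-activations
            it, ft, ot, gt = np.split(z, 4)
            it, ft, ot = sigmoid(it), sigmoid(ft), sigmoid(ot)
            gt = np.tanh(gt)
            new_c[i, j] = ft * c_map[i, j] + it * gt    # memory cell update
            new_h[i, j] = ot * np.tanh(new_c[i, j])     # hidden state update
    return new_h, new_c
```

Stacking several such steps lets information propagate farther with each layer, which is how the stacked layers enlarge each pixel's effective receptive field.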
Diffusion Shape Prior for Wrinkle-Accurate Cloth Registration
Registering clothes from 4D scans with vertex-accurate correspondence is
challenging, yet important for dynamic appearance modeling and physics
parameter estimation from real-world data. However, previous methods either
rely on texture information, which is not always reliable, or achieve only
coarse-level alignment. In this work, we present a novel approach that enables
accurate surface registration of texture-less clothes under large deformation.
Our key idea is to effectively leverage a shape prior learned from pre-captured
clothing using diffusion models. We also propose a multi-stage guidance scheme
based on learned functional maps, which stabilizes registration for large-scale
deformations even when they vary significantly from the training data. Using
high-fidelity real captured clothes, our experiments show that the proposed
approach based on diffusion models generalizes better than surface registration
with VAE or PCA-based priors, outperforming both optimization-based and
learning-based non-rigid registration methods for both interpolation and
extrapolation tests.
Comment: Project page:
https://www-users.cse.umn.edu/~guo00109/projects/3dv2024
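The general pattern of guiding a generative shape prior toward observed data during sampling can be sketched in a toy form. Everything here is a stand-in assumption: the "prior denoiser" is replaced by a simple pull toward a template, and guidance is a gradient step on a least-squares fit to the scan points; the paper's actual diffusion model and functional-map guidance are far richer.

```python
import numpy as np

def guided_prior_registration(scan_pts, template, steps=50, guidance=0.5, seed=0):
    """Toy sketch of prior-guided registration (illustrative only).

    A learned diffusion shape prior is stood in for by a denoiser that
    pulls vertices toward a template; each reverse step is then nudged
    by the gradient of a data-fit term toward the scan points, mimicking
    guidance-based sampling. Not the paper's actual method.
    """
    rng = np.random.default_rng(seed)
    x = template + rng.standard_normal(template.shape)  # noisy initialization
    for _ in range(steps):
        denoised = x + 0.2 * (template - x)   # stand-in for the learned prior
        data_grad = scan_pts - denoised       # -grad of 0.5 * ||scan - x||^2
        x = denoised + guidance * data_grad   # guidance step toward the scan
    return x
```

The fixed point of this toy iteration is a blend of the prior's preferred shape and the observed scan, which mirrors the intuition of balancing a shape prior against data evidence.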